Skip to content

Inline deep #3316

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Oct 24, 2017
Merged

Inline deep #3316

merged 4 commits into from
Oct 24, 2017

Conversation

nicolasstucki
Copy link
Contributor

Methods that cannot be inlined due to depth (compiling dotty 30 times in a loop):

Collected with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining

Benchmarks with XX:MaxInlineLevel=9

[info] # Fork: 1 of 1
[info] # Warmup Iteration   1: 33786.645 ms/op
[info] # Warmup Iteration   2: 17443.817 ms/op
[info] # Warmup Iteration   3: 15673.122 ms/op
[info] # Warmup Iteration   4: 14657.183 ms/op
[info] # Warmup Iteration   5: 14269.043 ms/op
[info] # Warmup Iteration   6: 14695.653 ms/op
[info] # Warmup Iteration   7: 14047.060 ms/op
[info] # Warmup Iteration   8: 13985.261 ms/op
[info] # Warmup Iteration   9: 13947.841 ms/op
[info] # Warmup Iteration  10: 14382.364 ms/op
[info] Iteration   1: 13818.103 ms/op
[info] Iteration   2: 13930.734 ms/op
[info] Iteration   3: 13996.306 ms/op
[info] Iteration   4: 14177.422 ms/op
[info] Iteration   5: 13915.746 ms/op
[info] Iteration   6: 13768.716 ms/op
[info] Iteration   7: 13849.529 ms/op
[info] Iteration   8: 14156.749 ms/op
[info] Iteration   9: 13592.561 ms/op
[info] Iteration  10: 13648.460 ms/op
[info] Iteration  11: 13645.160 ms/op
[info] Iteration  12: 13986.800 ms/op
[info] Iteration  13: 13579.973 ms/op
[info] Iteration  14: 13595.698 ms/op
[info] Iteration  15: 13753.457 ms/op
[info] Iteration  16: 14118.314 ms/op
[info] Iteration  17: 13578.536 ms/op
[info] Iteration  18: 13624.389 ms/op
[info] Iteration  19: 13540.987 ms/op
[info] Iteration  20: 14298.178 ms/op
[info] 
[info] 
[info] Result "dotty.tools.benchmarks.Worker.compile":
[info]   13828.791 ±(99.9%) 203.119 ms/op [Average]
[info]   (min, avg, max) = (13540.987, 13828.791, 14298.178), stdev = 233.912
[info]   CI (99.9%): [13625.672, 14031.910] (assumes normal distribution)

Benchmarks with XX:MaxInlineLevel=35

[info] # Fork: 1 of 1
[info] # Warmup Iteration   1: 30287.934 ms/op
[info] # Warmup Iteration   2: 17105.325 ms/op
[info] # Warmup Iteration   3: 14658.572 ms/op
[info] # Warmup Iteration   4: 14571.616 ms/op
[info] # Warmup Iteration   5: 14009.009 ms/op
[info] # Warmup Iteration   6: 14326.850 ms/op
[info] # Warmup Iteration   7: 13719.054 ms/op
[info] # Warmup Iteration   8: 13633.771 ms/op
[info] # Warmup Iteration   9: 13579.682 ms/op
[info] # Warmup Iteration  10: 14067.770 ms/op
[info] Iteration   1: 13388.887 ms/op
[info] Iteration   2: 13424.087 ms/op
[info] Iteration   3: 13648.308 ms/op
[info] Iteration   4: 13995.807 ms/op
[info] Iteration   5: 13399.279 ms/op
[info] Iteration   6: 13477.962 ms/op
[info] Iteration   7: 13418.966 ms/op
[info] Iteration   8: 13907.012 ms/op
[info] Iteration   9: 13229.703 ms/op
[info] Iteration  10: 13326.266 ms/op
[info] Iteration  11: 13274.452 ms/op
[info] Iteration  12: 13684.152 ms/op
[info] Iteration  13: 13353.853 ms/op
[info] Iteration  14: 13381.231 ms/op
[info] Iteration  15: 13240.378 ms/op
[info] Iteration  16: 13720.528 ms/op
[info] Iteration  17: 13263.029 ms/op
[info] Iteration  18: 13207.083 ms/op
[info] Iteration  19: 13245.706 ms/op
[info] Iteration  20: 13590.566 ms/op
[info] 
[info] 
[info] Result "dotty.tools.benchmarks.Worker.compile":
[info]   13458.863 ±(99.9%) 198.145 ms/op [Average]
[info]   (min, avg, max) = (13207.083, 13458.863, 13995.807), stdev = 228.184
[info]   CI (99.9%): [13260.718, 13657.008] (assumes normal distribution)

Changing XX:MaxRecursiveInlineLevel from 1 (default) to two made the compiler a bit slower.

@nicolasstucki
Copy link
Contributor Author

We can currently not test the performance of this change on the benchmark server. @liufengyun will work on some changes that will make this possible.

@nicolasstucki
Copy link
Contributor Author

Other interesting inlining data for methods too big to inline https://gist.github.com/nicolasstucki/e239b7d331aa152c75259b4273f75201. Unfortunately nothing close to 35 bytes.

@nicolasstucki nicolasstucki requested a review from smarter October 12, 2017 15:09
@smarter
Copy link
Member

smarter commented Oct 12, 2017

I would try MaxInlineLevel=18 too.

@@ -37,7 +37,7 @@ object Bench {
val libs = System.getProperty("BENCH_CLASS_PATH")

val opts = new OptionsBuilder()
.jvmArgsPrepend("-Xbootclasspath/a:" + libs + ":", "-Xms2G", "-Xmx2G")
.jvmArgsPrepend("-Xbootclasspath/a:" + libs + ":", "-Xms2G", "-Xmx2G", "-XX:MaxInlineLevel=35")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine for testing but we shouldn't merge this since it doesn't reflect what 99% of people using dotty through sbt will get.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I there a way to add this to sbt?

@nicolasstucki
Copy link
Contributor Author

MaxInlineLevel=32 was not enough.

@retronym
Copy link
Member

retronym commented Oct 13, 2017

Unfortunately nothing close to 35 bytes.

I assume you're aware, but for the benefit of others reading this, it is worth noting that that 35 byte threshold (-XX:MaxInlineSize=35) is used in HotSpot to (almost) unconditionally inline a method into its caller. Howver, larger methods are still inlined according to profile data: methods smaller than -XX:FreqInlineSize=325 are inlined if called frequently enough.

From your examples:

dotty.tools.dotc.core.Types$$anon$5::apply (344 bytes)   hot method too big

...would have been eligible for inlining if it were smaller than 325 bytecodes. If the message is just "too big", it means it wasn't a hot call site, and the callee would need to be <= 35 bytes to be inlined.

-XX:MaxInlineLevel=9 controls the depth of stacks of inlining (our trait encoding runs into this more than we would like!).

@nicolasstucki
Copy link
Contributor Author

@retronym, I aware of the other flags but I have not tried them all yet. I tried XX:MaxInlineLevel because our tait also reaches this limit. Changing XX:MaxInlineSize had negative effects. I will try to change XX:FreqInlineSize to see what happens, thanks for the tip :)

@nicolasstucki nicolasstucki merged commit 2b58780 into scala:master Oct 24, 2017
@allanrenucci allanrenucci deleted the inline-deep branch October 24, 2017 12:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants